Part 7. Hypothesis Testing

$$ \def\pr{\hbox{Pr}} \def\var{\hbox{var}} \def\cov{\hbox{cov}} \def\tr{\hbox{tr}} \def\corr{\hbox{corr}} \def\dmX{\un{\mathcal{X}}} \def\dmG{\un{\mathcal{G}}} \def\dmK{\un{\mathcal{K}}} \def\dmS{\un{\mathcal{S}}} \def\dmC{\un{\mathcal{C}}} \def\dmZ{\un{\mathcal{Z}}} \def\bma{{\mbox{\boldmath $\alpha$}}} \def\bmb{{\mbox{\boldmath $\beta$}}} \def\bmu{{\mbox{\boldmath $\mu$}}} \def\bme{{\mbox{\boldmath $\epsilon$}}} \def\bmS{{\mbox{\boldmath $\Sigma$}}} \def\bmL{{\mbox{\boldmath $\Lambda$}}} \def\bmd{{\mbox{\boldmath $\delta$}}} \def\bmD{{\mbox{\boldmath $\Delta$}}} \def\bmG{{\mbox{\boldmath $\Gamma$}}} \def\bmphi{{\mbox{\boldmath $\phi$}}} \def\bmPhi{{\mbox{\boldmath $\Phi$}}} \def\bmpsi{{\mbox{\boldmath $\psi$}}} \def\bmtheta{{\mbox{\boldmath $\theta$}}} \def\eq{\begin{equation}} \def\eeq{\end{equation}} \def\i{{\bf i}} \def\un#1{{\bf #1}}$$

Maximum likelihood

Consider a random sample $\un z(i)$, $i=1\dots m$, from a population whose density function is $p(\un z\mid \theta)$. The *likelihood function* for the sample is

$$L(\theta) = \prod_{i=1}^m p(\un z(i)\mid \theta).$$

Its logarithm

$$\mathcal{L}(\theta) = \sum_{i=1}^{m} \log p(\un z(i)\mid \theta),$$

is called the *log-likelihood*. Taking products is justified when the $\un z(i)$ are realizations of independent random vectors. The parameter set $\hat\theta$ which maximizes the likelihood function or its logarithm, i.e., which gives the largest value for all of the realizations, is called the *maximum likelihood estimate* of $\theta$.

For $m$ samples from the multivariate normal distribution:

$$\mathcal{L}(\mu,{\bf\Sigma}) = m{N\over 2}\log 2\pi-m{1\over 2}\log|{\bf\Sigma}|-{1\over 2} \sum_{i=1}^{m} (\un z(i)-\mu)^\top\bf\Sigma^{-1}(\un z(i)-\mu).$$

To maximize with respect to $\mu$ we set

$${\partial\mathcal{L}(\mu,{\bf\Sigma})\over\partial\mu}= \sum_{i=1}^{m} {\bf\Sigma}^{-1}(\un z(i)-\mu) =\un 0,$$

giving $$\hat\mu = {1\over m}\sum_{i=1}^{m} \un z(i),$$ which is the realization $ \bar{\un z}$ of the unbiased estimator for the sample mean.

In a similar way one can show that

$$\hat{\bf\Sigma} = {1\over m}\sum_{i=1}^{m}(\un z(i)-\mu)(\un z(i)-\mu)^\top,$$

which is (almost) the realization $\un s$ of the unbiased sample covariance matrix estimator, except that the denominator $m-1$ is replaced by $m$, a fact which can often be ignored.

Some terminology

A *statistical hypothesis* is a conjecture about the distributions of one or more random variables. It might, for instance, be an assertion about the mean of a distribution, or about the equivalence of the variances of two different distributions.

We distinguishe between *simple* hypotheses, for which the distributions are completely specified, for example: *the mean of a normal distribution with variance $\sigma^2$ is $\mu=0$*, and *composite* hypotheses, for which this is not the case, e.g., *the mean is $\mu\ge 0$*.

In order to test such assertions on the basis of samples of the distributions involved, it is also necessary to formulate *alternative* hypotheses. To distinguish these from the original assertions, the latter are called *null* hypotheses.

Thus we might be interested in testing the simple null hypothesis $\mu = 0$ against the composite alternative hypothesis $\mu\ne 0$.

An appropriate sample function for deciding whether or not to reject the null hypothesis in favor of its alternative is referred to as a *test statistic*.

An appropriate *test procedure* will partition the possible realizations of the test statistic into two subsets: an acceptance region for the null hypothesis and a rejection region. The latter is referred to as the *critical region*.

**Definition:** Referring to the null hypothesis as $H_0$, there are two kinds of errors which can arise from any test procedure:

  1. $H_0$ may be rejected when in fact it is true. This is called an *error of the first kind* and the probability that it will occur is denoted $\alpha$.

  2. $H_0$ may be accepted when in fact it is false, which is called an *error of the second kind* with probability of occurrence $\beta$.

The probability of obtaining a value of the test statistic within the critical region when $H_0$ is true is thus $\alpha$. The probability $\alpha$ is also referred to as the *level of significance* of the test.

It is generally the case that the lower the value of $\alpha$, the higher is the probability $\beta$ of making a second kind error. Traditionally significance levels of 0.01 or 0.05 are used.

Such values are obviously arbitrary, and for exploratory data analysis it is common to avoid specifying them altogether. Instead the *$P$-value* for the test is stated:

**Definition:** Given the observed value of a test statistic, the *$P$-value* is the lowest level of significance at which the null hypothesis could have been rejected.

High $P$-values provide evidence in favor of accepting the null hypothesis, without actually forcing one to commit to a decision.

Example: IAEA Nuclear material accontancy

$H_0:$ No material is missing (simple hypothesis)

$H_1:$ One "sgnificant quantity"is missing (simple hypothesis).

Test statistic:

$$ MUF = Inventory_0 + Input - Output - Inventory_1 $$

which might be assumed to be normally distibuted with mean 0 under $H_0$.

$\alpha$ is the false alarm probability.

$1-\beta$ is the detection probability.

Likelihood ratio test

Consider a random sample $\un z(i)$, $i=1\dots m$, from a population whose density function is $p(\un z\mid \theta)$. Here $\theta$ is the parameter set of the density function, e.g., $\theta = \{\mu,\sigma^2\}$. The likelihood function for the sample is $$L(\theta) = \prod_{i=1}^m p(\un z(i)\mid \theta).$$

**Definition:** Let $\omega$ be space of all possible values of the parameters $\theta$ and $\omega_0$ be a subset of that space. The likelihood ratio test for the null hypothesis $\theta\in\omega_0$ against the alternative $\theta\in\omega-\omega_0$ has the critical region

$${\max_{\theta\in \omega_0} L(\theta) \over \max_{\theta\in\omega} L(\theta)} \le k.$$

This definition simply reflects the fact that, if $H_0$ is true, the maximum likelihood for $\theta$ when restricted to $\omega_0$ should be close to the the maximum likelihood for $\theta$ without that restriction. Therefore, if the likelihood ratio is small, (less than or equal to some small value $k$), then $H_0$ should be rejected.

In many cases the likelihood ratio test will lead to a test statistic whose distribution is unknown. The LRT has, however, an important asymptotic property:

Theorem

If $\omega$ is a region of $R^q$ and $\omega_0$ is an $r$-dimensional subregion, then for each $\theta\in\omega_0$, $-2\log Q$ has an asymptotic chi-square distribution with $q-r$ degrees of freedom as $n\to\infty$.

**Example:** Given random samples $z_1,z_2\dots z_m$ from a normal distribution with mean $\mu$ and known variance $\sigma^2$. The likelihood ratio test at significance level $\alpha$ for the simple hypothesis $ H_0:\ \mu = \mu_0 $ against the alternative composite hypothesis $ H_1:\ \mu \ne \mu_0 $ has the critical region

$${L(\mu_0)\over L(\hat\mu)}\le k,$$

where $\hat\mu$ is the maximizes the likelihood function for all possible values of $\mu$, $$\hat\mu = \bar z = {1\over m}\sum_{i=1}^m z_i.$$ The critical region is then (try to show this)

$${L(\mu_0)\over L(\hat\mu)} = \exp\left(-{1\over 2\sigma^2/m}(\bar z - \mu_0)^2\right) \le k.$$

Equivalently, $$e^{-x^2} \le k,$$ where $$x = {\bar z - \mu_0 \over \sigma/\sqrt{m}}$$ So the critical region can be written in the form $$\left|{\bar z - \mu_0 \over \sigma/\sqrt{m}}\right| \ge k_\alpha.$$ But we know that $\bar z$ is normally distributed with mean $\mu_0$ and variance $\sigma^2/m$. Therefore $x$ has the standard normal distribution $\Phi(x)$ and $k_\alpha$ is determined by $$\Phi(-k_\alpha) + 1- \Phi(k_\alpha) = \alpha,$$ or $$1- \Phi(k_\alpha) = \alpha/2.$$

T and F tests

Suppose however that the variance $\sigma^2$ is unknown and that we wish nevertheless to make a statement about $\mu$ in terms of some realization of an appropriate statistic. To treat this and similar problems it is first necessary to define some additional distribution functions.

Theorem If the random variables $Z_i$, $i=0\dots m$, are independent and standard normally distributed, then the random variable

$$ T = { Z_0\over \sqrt{{1\over m}(Z_1^2+\dots + Z_m^2)} }\tag 7 $$

is Student-t distributed with $m$ degrees of freedom. The corresponding density is given by

$$ p_{t;m}(z) = d_m\left( 1 + {z^2\over m}\right)^{-(m+1)/2},\quad -\inftywith $d_m$ being a normalization factor. The Student-t distribution converges to the standard normal distribution for $m\to\infty$.

The Student-t distribution is used to make statements regarding the mean when the variance is unknown. Thus if $Z_i$, $i=1\dots m$, is a sample from a normal distribution with mean $\mu_0$ and unknown variance, and

$$\bar Z = {1\over m}\sum_{i=1}^m Z_i,\quad S = {1\over m}\sum_i^m (Z_i-\bar Z)^2,$$

then the random variable

$$T = {\bar Z - \mu_0 \over \sqrt{S/m} }$$

is Student-t distributed with $m-1$ degrees of freedom and is a likelihood ratio test for $H_0: \mu=\mu_0$, $H_1: \mu\ne\mu_0$.

If $Z_i$, $i=1\dots n$, and $Y_i$, $i=1\dots m$, are samples from normal distributions with equal variance, then the random variable

$$T_d = {\bar Z-\bar Y - (\mu_Z-\mu_Y)\over \sqrt{\left({1\over n}+{1\over m}\right){(n-1)S_Z+(m-1)S_Y\over n+m-2}}}\tag 9$$

is Student-t distributed with $n+m-2$ degrees of freedom. It may be used to test the hypothesis $\mu_Z=\mu_Y$.

**Definition:** The *F-probability density function* with $m$ and $n$ degrees of freedom is given by

$$ p_{f;m,n}(f) = \cases{ c_{mn} f^{(m-2)/2}\left(1+{m\over n}\ f\right)^{-(m+n)/2} & for $f>0$\cr 0 & otherwise,} $$

with normalization factor $c_{mn}$.

**Theorem:** Given independent and standard normally distributed random variables $Z_i$, $i=1\dots n$ and $Y_i$, $i=1\dots m$, the random variable

$$F = {{1\over m}(Y_1^2+\dots + Y_m^2) \over {1\over n}(Z_1^2+\dots + Z_n^2) }\tag{10}$$

is $F$-distributed with $m$ and $n$ degrees of freedom.

Note that $F$ is the ratio of two chi-square distributed random variables.

The $F$-distribution can be used for hypothesis tests regarding two variances. If $S_Z$ and $S_Y$ are sample variances for samples of size $n$ respectively $m$ drawn from normally distributed populations with variances $\sigma_Z^2$ and $\sigma_Y^2$, then $$F = {\sigma_Y^2S_Z\over \sigma_Z^2S_Y}$$ is a random variable having an $F$-distribution with $n-1$ and $m-1$ degrees of freedom. It is a likelihood ratio test for

$$H_0:\sigma_Z^2 = \sigma_Y^2, \quad H_1: \sigma_Z^2 \ne \sigma_Y^2.$$

Example: Automatic radiometric normalization of satellite imagery

In [1]:
ls
20010525           bayes-1.png                      result_fmap
20010525.hdr       dispms.py                        result_smap
20010525_pca       fig2_4.eps                       sar.jpg
20010525_pca.hdr   fig2_4.png                       sar_seq.py
20010626           greenlandvario.png               scripts.zip
20010626.hdr       iMad.py                          solutions.ipynb
20010829           lorenz.py                        statisticsPt1.ipynb
20010829.hdr       MAD(20010525-20010829)           statisticsPt2.ipynb
20010829_norm      MAD(20010525-20010829).hdr       statisticsPt3.ipynb
20010829_norm.hdr  MAD(20010626-20010829)           statisticsPt4.ipynb
20100426.tif       MAD(20010626-20010829).hdr       statisticsPt5.ipynb
20100520.tif       MILLER_et_al-1999-Tellus_A.pdf*  statisticsPt6.ipynb
20100707.tif       particle filter/                 statisticsPt7.ipynb
20100731.tif       particle_filter.pro              statisticsPt8.ipynb
20100824.tif       pca.py                           statisticsPt9.ipynb
20101011.tif       radcal.py                        subset.py
2dvario.png        registersar.py                   subset.pyc
angebot2016.rtf    result_bmap                      tutorialsar.ipynb
auxil/             result_cmap                      ZFL-Course-Part1.ipynb
In [2]:
%matplotlib inline
%run dispms -f /home/mort/stats2016/20010626 -e 1 -p [4,5,6] \
-F /home/mort/stats2016/20010829 -E 1 -P [4,5,6]
In [3]:
%run iMad /home/mort/stats2016/20010626 /home/mort/stats2016/20010829
------------IRMAD -------------
Wed Oct 26 16:05:25 2016
time1: /home/mort/stats2016/20010626
time2: /home/mort/stats2016/20010829
rho: [ 0.66817403  0.74675512  0.85250628  0.93893862  0.97844374  0.98936266]
result written to: /home/mort/stats2016/MAD(20010626-20010829)
elapsed time: 43.7148160934
In [4]:
run dispms -f /home/mort/stats2016/MAD(20010626-20010829) -e 3 -p [7,7,7] \
-F /home/mort/stats2016/MAD(20010626-20010829) -E 3 -P [4,5,6] 
In [5]:
run radcal /home/mort/stats2016/MAD(20010525-20010829)
Wed Oct 26 16:06:10 2016
reference: /home/mort/stats2016/20010525
target   : /home/mort/stats2016/20010829
no-change probability threshold: 0.95
no-change pixels for training: 1965, for testing: 983
band: 1  slope: 2.337748  intercept: -81.161352  corr: 0.445596  P(t-test): 0.542573  P(F-test): 0.000000
band: 2  slope: 2.542814  intercept: -73.761489  corr: 0.391407  P(t-test): 0.544183  P(F-test): 0.000000
band: 3  slope: 6.453502  intercept: -269.867452  corr: 0.151534  P(t-test): 0.576725  P(F-test): 0.000000
band: 4  slope: 1.162937  intercept: 9.173728  corr: 0.534290  P(t-test): 0.576912  P(F-test): 0.015729
band: 5  slope: 3.906851  intercept: -175.208401  corr: 0.270889  P(t-test): 0.686340  P(F-test): 0.000000
band: 6  slope: 19.469899  intercept: -765.005291  corr: 0.067577  P(t-test): 0.828101  P(F-test): 0.000000
result written to: /home/mort/stats2016/20010829_norm
elapsed time: 0.925208091736
In [6]:
run dispms -f /home/mort/stats2016/20010829 -e 1 -p [4,5,6] \
-F /home/mort/stats2016/20010829_norm -E 1 -P [4,5,6]

ANOVA: More than two populations

The case often arises when one wants to compare the mean values of several random variables, for instance if several medications are equally effective.

Suppose that $X_1, X_2,\dots, X_k$ are normally distributed random variables with means $\mu_1,\mu_2,\dots,\mu_k$ and variance $\sigma^2$. We want to test the null hypothesis

$$ H_0: \quad \mu_1 = \mu_2 = \dots = \mu_k. $$

Let

$$ X_{11},X_{12},\dots,X_{1n_1},\quad X_{21},X_{22},\dots,X_{2n_2},\quad\dots\quad X_{k1},X_{k2},\dots,X_{kn_k} $$

be independent samples drawn from the $k$ distributions. Call this the overall sample, consisting of $k$ partial samples. The total number of samples is

$$ n = \sum_1^k n_i. $$

The overall sample mean and variance are then

$$ \bar X = {1\over n}\sum_{i=1}^k\sum_{j=1}^{n_i} X_{ij},\quad S = {1\over n-1}\sum_{i=1}^k\sum_{j=1}^{n_i}(X_{ij}-\bar X)^2. $$

Similarly for the partial samples,

$$ \bar X_i = {1\over n_i}\sum_{j=1}^{n_i}X_{ij}, \quad S_i= {1\over n_i-1}\sum_{j=1}^{n_i}(X_{ij}-\bar X_i)^2. $$

Now consider

$$\eqalign{ \sum_{j=1}^{n_i}(X_{ij}-\bar X)^2 &= \sum_{j=1}^{n_i}(X_{ij}-\bar X_i + \bar X_i-\bar X)^2 = \sum_{j=1}^{n_i}((X_{ij}-\bar X_i) + (\bar X_i-\bar X))^2 \cr &= \sum_{j=1}^{n_i}(X_{ij}-\bar X_i)^2 + 2(X_i-\bar X)\sum_{j=1}^{n_i}(X_{ij}-\bar X_i) + n_i(\bar X_i-\bar X)^2 \cr &= (n_i-1)S_i + n_i(\bar X_i-\bar X)^2, } $$

since the second term vanishes. Therefore

$$ (n-1)S = \sum_{i=1}^k\sum_{j=1}^{n_i}(X_{ij}-\bar X)^2 = \sum_{i=1}^k (n_i-1)S_i + \sum_{i=1}^k n_i(\bar X_i-\bar X)^2.\tag{11} $$

Thus the overall sample variance can be split into a component that only depends upon the partial sample variances and a component that depends on the square of the distances between the partial sample means and the overall sample mean.

Only the second term in (11) depends upon the partial sample means. So if we consider the ratio

$$ {\sum_i n_i(\bar X_i-\bar X)^2\over \sum_i (n_i-1)S_i} = {\sum_i n_i(\bar X_i-\bar X)^2\over \sum_i\sum_j (X_{ij}-\bar X_i)^2 } $$

large values will indicate that the means are different.

Under the null hypothesis all the $X_{ij}$ have the same (unknown) variance $\sigma^2$ and $\bar X_i$ has variance $\sigma^2/n_i$. Rewriting the above ratio as

$$ {\sum_i {(\bar X_i-\bar X)^2\over \sigma^2/n_i} \over \sum_i\sum_j {(X_{ij}-\bar X_i)^2 \over \sigma^2}} $$

The numerator consists of the sum of the squares of $k$ standard normally distributed random variables, the denominator of $n$ standard normally distributed random variables. So if we construct the ratio

$$ {{1\over k}\sum_i n_i(\bar X_i-\bar X)^2\over {1\over n}\sum_i\sum_j (X_{ij}-\bar X_i)^2 } = {n\sum_i n_i(\bar X_i-\bar X)^2\over k\sum_i\sum_j (X_{ij}-\bar X_i)^2 } $$

and compare with (10) we seem to have a test statistic with the F-distribution with $k$ and $n$ degrees of freedom. In fact, since we are using the estimated means (one estimate in the numerator and $k$ in the denominator), the correct statistic is

$$ {(n-k)\sum_i n_i(\bar X_i-\bar X)^2\over (k-1)\sum_i\sum_j (X_{ij}-\bar X_i)^2 } $$

which is F-distributed with $k-1$ and $n-k$ degrees of freedom.

In [7]:
import numpy as np
from scipy.stats import f
X1 = [21,22,26]
X2 = [30,29,28]
X3 = [25,27]
X = X1+X2+X3
X1bar = np.mean(X1)
X2bar = np.mean(X2)
X3bar = np.mean(X3)
Xbar = np.mean(X)

num = (8-3)*(3*(X1bar-Xbar)**2+3*(X2bar-Xbar)**2+2*(X3bar-Xbar)**2)

den = (3-1)*(3*np.var(X1)+3*np.var(X2)+2*np.var(X3) )

test = num/den

print 'test statistic = %f' %test

print 'rejection threshold = %f' %f.ppf(1-0.05,2,5)

# P-value

print 'p-value = %f' %(1-f.cdf(7.5,2,5))
test statistic = 7.500000
rejection threshold = 5.786135
p-value = 0.031250

Example: WIshart test (polarimetric SAR change detection)

The power received by a radar transmitting/receiving antenna reflected from a distributed (as opposed to point) target a distance $D$ from the antenna is given by

$$ P_R = {P_TG_TG_R\lambda^2\sigma^o\Delta_a\Delta_r\over (4\pi)^3D^4}\ [W], $$

where $P_T$ is the transmitted power $[W\cdot m^{-2}]$, $\lambda$ is the operating wavelength~$[m]$, $G_T(G_R)$ is the transmitting(receiving) antenna gain, $\Delta_a(\Delta_r)$ is the azimuth(ground range) resolution $[m]$ and $\sigma^o$ is the unitless scattering coefficient (referred to as the {\it radar cross section}) of the target surface.

A fully polarimetric SAR measures a $2\times 2$ scattering matrix $\un S$ at each resolution cell on the ground. The scattering matrix relates the incident and the backscattered electric fields $\un E^i$ and $\un E^b$ according to

$$ \un E^b = \un S\un E^i \quad\hbox{or}\quad \pmatrix{E_h^b \cr E_v^b} =\pmatrix{S_{hh} & S_{hv}\cr S_{vh} & S_{vv}}\pmatrix{E_h^i \cr E_v^i}. $$

The per-pixel polarimetric information in the scattering matrix $\un S$, under the assumption of reciprocity ($S_{hv} = S_{vh}$), can be expressed as a three-component complex vector

$$ \un s = \pmatrix{S_{hh}\cr \sqrt{2}S_{hv}\cr S_{vv}}, $$

where the $\sqrt{2}$ ensures that the total intensity (received signal power) is consistent.

The total intensity is referred to as the span,

$$ {\rm span} = \un s^+\un s = |S_{hh}|^2 + 2|S_{hv}|^2 + |S_{vv}|^2, $$

and the corresponding gray-scale image as the {\it span image}.

Another representation of the polarimetric signal is the {\it complex covariance matrix} $\un C$, obtained by taking the complex outer product of $\un s$ with itself:

$$ \un C = \un s\un s^+ = \pmatrix{ |S_{hh}|^2 & \sqrt{2}S_{hh}S_{hv}^* & S_{hh}S_{vv}^* \cr \sqrt{2}S_{hv}S_{hh}^* & 2|S_{hv}|^2 & \sqrt{2}S_{hv}S_{vv}^* \cr S_{vv}S_{hh}^* & \sqrt{2}S_{vv}S_{hv}^* & |S_{vv}|^2 }. $$

The diagonal elements of $\un C$ are real numbers, with span $= {\rm Tr}(\un C)$, and the off-diagonal elements are complex. The covariance matrix representation contains all of the information in the polarized signal.

Multi-look processing essentially corresponds to the averaging of neighborhood pixels, with the objective of reducing speckle and compressing the data. In practice the averaging is often not performed in the spatial domain, but rather in the frequency domain during range/azimuth compression of the received signal. The covariance matrix representation contains all of the information in the polarized signal. It can be averaged over the number of looks to give

$$\eqalign{ \bar{\un C} & ={1\over m}\sum_{\nu=1}^m \un s(\nu)\un s(\nu)^+ = \langle \un s\un s^+ \rangle \cr & = \pmatrix{ \langle |S_{hh}|^2\rangle & \langle\sqrt{2}S_{hh}S_{hv}^*\rangle & \langle S_{hh}S_{vv}^*\rangle \cr \langle\sqrt{2} S_{hv}S_{hh}^*\rangle & \langle 2|S_{hv}|^2\rangle & \langle\sqrt{2}S_{hv}S_{vv}^*\rangle \cr \langle S_{vv}S_{hh}^*\rangle & \langle\sqrt{2}S_{vv}S_{hv}^*\rangle & \langle |S_{vv}|^2\rangle }.} $$

The quantity

$$ m\bar{\un C} = \sum_{\nu=1}^m \un s(\nu)\un s(\nu)^+ =: \un x. $$

is a realization of a complex sample matrix which is known to have a {\it complex Wishart distribution} with $N\times N$ covariance matrix $\bmS$ (here $N=3$) and $m$ degrees of freedom

$$ p_{W_c}(\un x) ={|\un x|^{(m-N)}\exp(-{\rm tr}({\bf\Sigma}^{-1}\un x)) \over \pi^{N(N-1)/2}|{\bf\Sigma}|^{m}\prod_{i=1}^N\Gamma(m+1-i)},\quad m \ge N, $$

provided that the vectors $\un s(\nu)$ are independent and drawn from a complex multivariate normal distribution.

We represent a pixel vector in an $m$ look-averaged polarimetric SAR image in covariance matrix format by $\bar{\un C}$, where

$$ m\bar{\un C} = \un x = \sum_{\nu=1}^m \un s(\nu)\un s(\nu)^+ $$

is a realization of a random matrix $\un X$ with a complex Wishart distribution. If

$$ \un X = \sum_{\nu=1}^m\un Z(\nu)\un Z(\nu)^+,\quad \un Z(\nu) \sim \mathcal{N}_C(\un 0,\bf\Sigma),\ \nu=1\dots m, $$

is complex Wishart distributed with covariance matrix $\bf\Sigma$ and $m$ degrees of freedom, then the maximum likelihood estimate for $\bf\Sigma$ is

$$ \hat{\bf\Sigma} = {1\over m}\sum_{\nu=1}^m\un z(\nu)\un z(\nu)^+ = {1\over m}\un x, $$

so that each pixel $\bar{\un C}$ is the maximum likelihood estimate of $\bf\Sigma$.

To simplify the notation , define

$$ \Gamma_N(m) = \pi^{N(N-1)/2} \prod_{i=1}^N\Gamma(m+1-i). $$

Then, for two $m$-look quad polarimetric covariance images $\un X_1$ and $\un X_2$, the multivariate densities are

$$\eqalign{ p(\un x_1\mid m,{\bf\Sigma_1}) &= { |\un x_1|^{m-3}\exp(-\tr({\bf\Sigma}_1^{-1}\un x_1)) \over |{\bf\Sigma}_1|^m\Gamma_3(m)}\cr p(\un x_2\mid m,{\bf\Sigma_2}) &= { |\un x_2|^{m-3}\exp(-\tr({\bf\Sigma}_2^{-1}\un x_2)) \over |{\bf\Sigma}_2|^m\Gamma_3(m)}. } $$

We now define the null (or no-change) simple hypothesis

$$ H_0:\quad {\bf\Sigma}_1 = {\bf\Sigma}_2 = {\bf\Sigma}, $$

against the alternative composite hypothesis $$ H_1:\quad {\bf\Sigma}_1 \ne {\bf\Sigma}_2. $$

Under $H_0$ the likelihood for ${\bf\Sigma}$ is given by

$$ L({\bf\Sigma}) = p(\un x_1\mid m,{\bf\Sigma})p(\un x_2\mid m,{\bf\Sigma}) = { |\un x_1|^{m-3}|\un x_2|^{m-3}\exp(-\tr({\bf\Sigma}^{-1}(\un x_1+ \un x_2))) \over |{\bf\Sigma}|^{2m}\Gamma_3(m)^2}. $$

The maximum likelihood estimate of ${\bf\Sigma}$ is

$$ \hat{\bf\Sigma} = {1\over 2m}(\un x_1+\un x_2). $$

Hence the maximum likelihood under the null hypothesis is

$$ L(\hat{\bf\Sigma}) = { |\un x_1|^{m-3}|\un x_2|^{m-3}\exp(-2m\cdot\tr(\un I)) \over \left({1\over 2m}\right)^{3\cdot 2m}|\un x_1+\un x_2|^{2m}\Gamma_3(m)^2 }, $$

where $\un I$ is the $3\times 3$ identity matrix and $\tr(\un I)=3$.

Under $H_1$ we obtain, similarly, the maximum likelihood for ${\bf\Sigma}_1$ and ${\bf\Sigma}_2$ as

$$ L(\hat{\bf\Sigma}_1,\hat{\bf\Sigma}_2) = { |\un x_1|^{m-3}|\un x_2|^{m-3}\exp(-2m\cdot\tr(\un I)) \over \left({1\over m}\right)^{3m}\left({1\over m}\right)^{3m} |\un x_1|^m |\un x_2|^m\Gamma_3(m)^2 } $$

Then the Likelihood ratio test has the critical region

$$ Q = {L(\hat{\bf\Sigma}) \over L(\hat{\bf\Sigma}_1,\hat{\bf\Sigma}_2) } = 2^{6m}{ |\un x_1|^m |\un x_2|^m \over |\un x_1 + \un x_2|^{2m} } \le k. $$

For a large number of looks $m$, we can apply the asymptotic approximation described above to obtain the following distribution for the test statistic: As $m\to\infty$, the quantity

$$ -2\log Q = -2m( 6\log 2 + \log|\un x_1| + \log|\un x_2| -2\log|\un x_1+\un x_2| ) $$

is a realization of a chi-square random variable with 9 degrees of freedom, i.e.,

$$ \pr(-2\log Q\le z) \simeq P_{\chi^2;9}(z). $$

This follows from the fact that the number of parameters required to specify a $3\times 3$ complex covariance matrix is 9: three for the real diagonal elements and $2\times 3$ for the three complex elements above the diagonal. Thus the parameter space $\omega$ for $({\bf\Sigma}_1,{\bf\Sigma}_2)$ has dimension $q=2\times 9$, and the subspace $\omega_0$ for the simple null hypothesis has dimension $r=9$, so $q-r=9$.

Multitemporal data

The preceding discussion generalizes in a straightforward way to a time series of $n$ images (Conradsen et al (2016)). In the equation for the test statistic $Q$ above, the numerator consists of a product $k$ determinants $|\un x_1|\cdot|\un x_2|\cdot\dots|\un x_n|$ and the denominator similarly the determinant of the sum of the $n$ observations $|\un x_1+\un x_2+\dots \un x_n|$. The multitemporal test is referred to as the omnibus test will in general be more powerful (have a higher detection probability) for the same significance level than a bitemporal test just involving the first and last images. Furthermore, the test statistic can be factored into a product of test statistics that can be used to ascertain the time or times at which significant changes occur in the series.

In [8]:
run sar_seq -h
Usage:
------------------------------------------------

Sequential change detection for polarimetric SAR images

python sar_seq.py [OPTIONS]  infiles outfile enl

Options:
  
  -h           this help
  -m           run 3x3 median filter on p-values prior to thresholding (e.g. for noisy satellite data)  
  -d  dims     files are to be co-registered to a subset dims = [x0,y0,rows,cols] of the first image, otherwise
               it is assumed that the images are co-registered and have identical spatial dimensions  
  -s  signif   significance level for change detection (default 0.01)

infiles:

  comma-separated list of full paths to input files, no blank spaces: /path/to/infile_1, ... ,/path/to/infile_k
  
outfile:

  without path (will be written to same directory as infile_1)
  
enl:

  equivalent number of looks

-------------------------------------------------
In [9]:
run sar_seq -m '20100426.tif','20100520.tif','20100707.tif','20100731.tif','20100824.tif','20101011.tif' result 12
===============================================
     Multi-temporal SAR Change Detection
===============================================
Wed Oct 26 16:06:13 2016
First (reference) filename:  20100426.tif
number of images: 6
equivalent number of looks: 12.000000
significance level: 0.010000
pre-calculating Rj and p-values ...
attempting parallel calculation ...
Hub connection request timed out 
failed, so running sequential calculation ...
ell=  1 2 3 4 5 
elapsed time for p-value calculation: 33.224725008
most recent change map written to: /home/mort/stats2016/result_cmap
frequency map written to: /home/mort/stats2016/result_fmap
bitemporal map image written to: /home/mort/stats2016/result_bmap
first change map written to: /home/mort/stats2016/result_smap
total elapsed time: 33.334581852
In [10]:
%matplotlib inline
In [11]:
run dispms -f /home/mort/stats2016/result_cmap -c -d [300,300,200,200] \
-F /home/mort/stats2016/20100426.tif -D [300,300,200,200] -o 0.5
In [ ]: